This memo does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.


Abstract 


This memo makes the case for interdependent TCP control blocks, where 
part of the TCP state is shared among similar concurrent connections, 
or across similar connection instances. TCP state includes a 
combination of parameters, such as connection state, current round- 
trip time estimates, congestion control information, and process 
information. This state is currently maintained on a per-connection 
basis in the TCP control block, but should be shared across 
connections to the same host. The goal is to improve transient 
transport performance, while maintaining backward-compatibility with 
existing implementations. 


This document is a product of the LSAM project at ISI. 


Introduction 


TCP is a connection-oriented reliable transport protocol layered over 
IP [9]. Each TCP connection maintains state, usually in a data 
structure called the TCP Control Block (TCB). The TCB contains 
information about the connection state, its associated local process, 
and feedback parameters about the connection’s transmission 
properties. As originally specified and usually implemented, the TCB 
is maintained on a per-connection basis. This document discusses the 
implications of that decision, and argues for an alternate 
implementation that shares some of this state across similar 
connection instances and among similar simultaneous connections. The 
resulting implementation can have better transient performance, 
especially for numerous short-lived and simultaneous connections, as 
often used in the World-Wide Web [1]. These changes affect only the 
TCB initialization, and so have no effect on the long-term behavior 
of TCP after a connection has been established. 


Touch Informational [Page 1] 


RFC 2140 TCP Control Block Interdependence April 1997 


The TCP Control Block (TCB) 


A TCB is associated with each connection, i.e., with each association 
of a pair of applications across the network. The TCB can be 
summarized as containing [9]: 


Local process state 

    pointers to send and receive buffers 
    pointers to retransmission queue and current segment 
    pointers to Internet Protocol (IP) PCB 

Per-connection shared state 

    macro-state 

        connection state 
        timers 
        flags 
        local and remote host numbers and ports 

    micro-state 

        send and receive window state (size*, current number) 
        round-trip time and variance 
        cong. window size* 
        cong. window size threshold* 
        max windows seen* 
        MSS# 
        round-trip time and variance# 


The per-connection information is shown as split into macro-state and 
micro-state, terminology borrowed from [5]. Macro-state describes the 
finite state machine; we include the endpoint numbers and components 
(timers, flags) used to help maintain that state. This includes the 
protocol for establishing and maintaining shared state about the 
connection. Micro-state describes the protocol after a connection has 
been established, to maintain the reliability and congestion control 
of the data transferred in the connection. 


We further distinguish two other classes of shared micro-state that 
are associated more with host-pairs than with application pairs. One 
class is clearly host-pair dependent (#, e.g., MSS, RTT), and the 
other is host-pair dependent in its aggregate (*, e.g., cong. window 
info., curr. window sizes). 
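

The split described above can be sketched as a data structure. This is an
illustrative sketch only: the class and field names are hypothetical, the
default MSS of 536 is an assumed placeholder, and real TCBs hold far more
state. It shows only that '#' and '*' items can live in one record that
several connections reference.

```python
from dataclasses import dataclass, field


@dataclass
class HostPairState:
    """Hypothetical record for state tied to the host pair."""
    mss: int = 536           # '#': host-pair dependent (assumed default)
    srtt_ms: float = 0.0     # '#': smoothed round-trip time
    rttvar_ms: float = 0.0   # '#': round-trip time variance
    cwnd_total: int = 0      # '*': host-pair dependent in its aggregate


@dataclass
class ConnectionState:
    """Hypothetical record for state that stays per-connection."""
    local_port: int
    remote_port: int
    fsm_state: str = "CLOSED"   # macro-state
    shared: HostPairState = field(default_factory=HostPairState)


# Two connections to the same remote host reference one shared record.
shared = HostPairState(mss=1460)
a = ConnectionState(local_port=40000, remote_port=80, shared=shared)
b = ConnectionState(local_port=40001, remote_port=80, shared=shared)

# A measurement recorded once is visible through both connections.
shared.srtt_ms = 120.0
```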


TCB Interdependence 


The observation that some TCB state is host-pair specific rather than 
application-pair dependent is not new, and is a common engineering 
decision in layered protocol implementations. A discussion of sharing 
RTT information among protocols layered over IP, including UDP and 
TCP, occurred in [8]. T/TCP uses caches to maintain TCB information 
across instances, e.g., smoothed RTT, RTT variance, congestion 
avoidance threshold, and MSS [3]. These values are in addition to 
connection counts used by T/TCP to accelerate data delivery prior to 
the full three-way handshake during an OPEN. The goal is to aggregate 
TCB components where they reflect one association - that of the 
host-pair, rather than artificially separating those components by 
connection. 


At least one current T/TCP implementation saves the MSS and 
aggregates the RTT parameters across multiple connections, but omits 
caching the congestion window information [4], as originally 
specified in [2]. Other values, such as the current window size, may 
also be worth caching to give new connections full access to 
accumulated channel resources. 


We observe that there are two cases of TCB interdependence. Temporal 
sharing occurs when the TCB of an earlier (now CLOSED) connection to 
a host is used to initialize some parameters of a new connection to 
that same host. Ensemble sharing occurs when a currently active 
connection to a host is used to initialize another (concurrent) 
connection to that host. T/TCP documents considered the temporal 
case; we consider both. 


Example of Temporal Sharing 


Temporal sharing of cached TCB data has been implemented in the SunOS 
4.1.3 T/TCP extensions [4] and the FreeBSD port of same [7]. As 
mentioned before, only the MSS and RTT parameters are cached, as 
originally specified in [2]. Later discussion of T/TCP suggested 
including congestion control parameters in this cache [3]. 


The cache is accessed in two ways: it is read to initialize new TCBs, 
and written when more current per-host state is available. New TCBs 
are initialized as follows; snd_cwnd reuse is not yet implemented, 
although discussed in the T/TCP concepts [2]: 


TEMPORAL SHARING - TCB Initialization 


Cached TCB        New TCB 

old-MSS           old-MSS 
old-RTT           old-RTT 
old-RTTvar        old-RTTvar 
old-snd_cwnd      old-snd_cwnd (not yet impl.) 


Most cached TCB values are updated when a connection closes. An 
exception is MSS, which is updated whenever the MSS option is 
received in a TCP header. 


TEMPORAL SHARING - Cache Updates 


Cached TCB      Current TCB      when?     New Cached TCB 

old-MSS         curr-MSS         MSSopt    curr-MSS 
old-RTT         curr-RTT         CLOSE     old += (curr - old) >> 2 
old-RTTvar      curr-RTTvar      CLOSE     old += (curr - old) >> 2 
old-snd_cwnd    curr-snd_cwnd    CLOSE     curr-snd_cwnd (not yet impl.) 


MSS caching is trivial; reported values are cached, and the most 
recent value is used. The cache is updated when the MSS option is 
received, so the cache always has the most recent MSS value from any 
connection. The cache is consulted only at connection establishment, 
and not otherwise updated, which means that MSS options do not affect 
current connections. The default MSS is never saved; only reported 
MSS values update the cache, so an explicit override is required to 
reduce the MSS. 


RTT values are updated by a more complicated mechanism [3], [8]. 
Dynamic RTT estimation requires a sequence of RTT measurements, even 
though a single T/TCP transaction may not accumulate enough samples. 
As a result, the cached RTT (and its variance) is an average of its 
previous value with the contents of the currently active TCB for that 
host, when a TCB is closed. RTT values are updated only when a 
connection is closed. Further, the method for averaging the RTT 
values is not the same as the method for computing the RTT values 
within a connection, so that the cached value may not be appropriate. 
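

The temporal-sharing rules for MSS and RTT can be sketched as follows. This
is a minimal illustration, not the T/TCP code: the function names, the
per-host dictionary, and the choice to seed an empty cache entry with the
first closing connection's values are all assumptions. RTT values are kept
as integer ticks so that the shift matches the table.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass
class CachedTCB:
    mss: Optional[int] = None
    rtt: Optional[int] = None      # integer ticks
    rttvar: Optional[int] = None   # integer ticks


cache: Dict[str, CachedTCB] = {}   # keyed by remote host address


def init_new_tcb(host: str) -> CachedTCB:
    """Read the cache at connection establishment (copy, not reference)."""
    return replace(cache.get(host, CachedTCB()))


def on_mss_option(host: str, reported_mss: int) -> None:
    """MSS is cached whenever the option is received; latest value wins."""
    cache.setdefault(host, CachedTCB()).mss = reported_mss


def merge(old: Optional[int], curr: int) -> int:
    """old += (curr - old) >> 2; an empty entry takes curr (assumption)."""
    if old is None:
        return curr
    return old + ((curr - old) >> 2)


def on_close(host: str, curr_rtt: int, curr_rttvar: int) -> None:
    """RTT and its variance are merged into the cache only at CLOSE."""
    entry = cache.setdefault(host, CachedTCB())
    entry.rtt = merge(entry.rtt, curr_rtt)
    entry.rttvar = merge(entry.rttvar, curr_rttvar)


on_mss_option("192.0.2.1", 1460)
on_close("192.0.2.1", 200, 40)   # first connection closes
on_close("192.0.2.1", 100, 20)   # second connection closes
new_tcb = init_new_tcb("192.0.2.1")
```

With these inputs the cached RTT moves one quarter of the way from 200
toward 100, which illustrates why the cached value can differ from what a
single long-lived connection would have computed.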


For temporal sharing, the cache requires updating only when a 
connection closes, because the cached values will not yet be used to 
initialize a new TCB. For the ensemble sharing, this is not the case, 
as discussed below. 


Other TCB variables may also be cached between sequential instances, 
such as the congestion control window information. Old cache values 
can be overwritten with the current TCB estimates, or a MAX or MIN 
function can be used to merge the results, depending on the optimism 
or pessimism of the reused values. For example, the congestion window 
can be reused if there are no concurrent connections. 
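

The three merge choices named above (overwrite, MAX, MIN) can be written as
one small helper. The function name and the policy strings are hypothetical;
which policy suits which variable is left open here, as in the text.

```python
def merge_cached(old, curr, policy="overwrite"):
    """Combine a cached value with the closing connection's value."""
    if old is None or policy == "overwrite":
        return curr
    if policy == "max":     # optimistic for window-like values
        return max(old, curr)
    if policy == "min":     # pessimistic for window-like values
        return min(old, curr)
    raise ValueError("unknown policy: " + policy)


overwritten = merge_cached(8, 4)
optimistic = merge_cached(8, 4, policy="max")
pessimistic = merge_cached(8, 4, policy="min")
```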


Example of Ensemble Sharing 


Sharing cached TCB data across concurrent connections requires 
attention to the aggregate nature of some of the shared state. 
Although MSS and RTT values can be shared by copying, it may not be 
appropriate to copy congestion window information. At this point, we 
present only the MSS and RTT rules: 


ENSEMBLE SHARING - TCB Initialization 


Cached TCB      New TCB 

old-MSS         old-MSS 
old-RTT         old-RTT 
old-RTTvar      old-RTTvar 


ENSEMBLE SHARING - Cache Updates 


Cached TCB      Current TCB      when?     New Cached TCB 

old-MSS         curr-MSS         MSSopt    curr-MSS 
old-RTT         curr-RTT         update    rtt_update(old, curr) 
old-RTTvar      curr-RTTvar      update    rtt_update(old, curr) 


For ensemble sharing, TCB information should be cached as early as 
possible, sometimes before a connection is closed. Otherwise, opening 
multiple concurrent connections may not result in TCB data sharing if 
no connection closes before others open. An optimistic solution would 
be to update cached data as early as possible, rather than only when 
a connection is closing. Some T/TCP implementations do this for MSS 
when the TCP MSS header option is received [4], although it is not 
addressed specifically in the concepts or functional specification 
[2] [3]. 


In current T/TCP, RTT values are updated only after a CLOSE, which 
does not benefit concurrent sessions. As mentioned in the temporal 
case, averaging values between concurrent connections requires 
incorporating new RTT measurements. The amount of work involved in 
updating the aggregate average should be minimized, but the resulting 
value should be equivalent to having all values measured within a 
single connection. The function "rtt_update" in the ensemble sharing 
table indicates this operation, which occurs whenever the RTT would 
have been updated in the individual TCP connection. As a result, the 
cache contains the shared RTT variables, which no longer need to 
reside in the TCB [8]. 
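

One way to picture "rtt_update" is a single estimator per remote host that
every concurrent connection feeds. The sketch below is an assumption about
how such a function might look, not the memo's definition: the class name is
hypothetical, and the gains of 1/8 and 1/4 and the seeding of the variance
at half the first sample are the conventional TCP estimator choices, used
here only for illustration.

```python
class SharedRtt:
    """Hypothetical per-host RTT estimator shared by all connections."""

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def rtt_update(self, sample):
        if self.srtt is None:
            # The first measurement from any connection seeds the estimator.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Error is taken against the estimate before it is updated.
            err = sample - self.srtt
            self.srtt += err / 8
            self.rttvar += (abs(err) - self.rttvar) / 4
        return self.srtt, self.rttvar


# Samples arrive interleaved from two connections, "A" and "B".
shared = SharedRtt()
for conn, sample_ms in [("A", 100), ("B", 120), ("A", 80), ("B", 110)]:
    shared.rtt_update(sample_ms)
```

Because the estimator never looks at which connection produced a sample, the
result is the same as if one connection had measured all four values, which
is the equivalence the text asks for.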


Congestion window size aggregation is more complicated in the 
concurrent case. When there is an ensemble of connections, we need 
to decide how that ensemble would have shared the congestion window, 
in order to derive initial values for new TCBs. Because concurrent 
connections between two hosts share network paths (usually), they 
also share whatever capacity exists along that path. With regard to 
congestion, the set of connections might behave as if it were 
multiplexed prior to TCP, as if all data were part of a single 
connection. As a result, the current window sizes would maintain a 
constant sum, presuming sufficient offered load. This would go beyond 
caching to truly sharing state, as in the RTT case. 


We pause to note that any assumption of this sharing can be 
incorrect, including this one. In current implementations, new 
congestion windows are set at an initial value of one segment, so 
that the sum of the current windows is increased for any new 
connection. This can have detrimental consequences where several 
connections share a highly congested link, such as in trans-Atlantic 
Web access. 


There are several ways to initialize the congestion window in a new 
TCB among an ensemble of current connections to a host, as shown 
below. Current TCP implementations initialize it to one segment [9], 
and T/TCP hinted that it should be initialized to the old window size 
[3]. In the former, the assumption is that new connections should 
behave as conservatively as possible. In the latter, no accommodation 
is made to concurrent aggregate behavior. 


In either case, the sum of window sizes can increase, rather than 
remain constant. Another solution is to give each pending connection 
its "fair share" of the available congestion window, and let the 
connections balance from there. The assumption we make here is that 
new connections are implicit requests for an equal share of available 
link bandwidth which should be granted at the expense of current 
connections. This may or may not be the appropriate function; we 
propose that it be examined further. 


ENSEMBLE SHARING - TCB Initialization 
Some Options for Sharing Window-size 


Cached TCB                      New TCB 

old-snd_cwnd    (current)       one segment 
                (T/TCP hint)    old-snd_cwnd 
                (proposed)      old-snd_cwnd/(N+1) 
                                subtract old-snd_cwnd/(N+1)/N 
                                from each concurrent 


ENSEMBLE SHARING - Cache Updates 


Cached TCB      Current TCB      when?     New Cached TCB 

old-snd_cwnd    curr-snd_cwnd    update    (adjust sum as appropriate) 
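

The proposed "fair share" option can be worked through in a few lines. This
sketch assumes old-snd_cwnd is read as the aggregate window of the N
concurrent connections, measured in segments as floating-point values; the
function name is hypothetical. It applies the table literally and does not
enforce any per-connection floor.

```python
def admit_connection(cwnds):
    """Return (updated existing windows, window for the new connection).

    cwnds holds the congestion windows of the N concurrent connections
    to one host. The new connection receives total/(N+1), and each
    existing connection gives up total/(N+1)/N, so the sum is unchanged.
    """
    n = len(cwnds)
    if n == 0:
        raise ValueError("needs at least one concurrent connection")
    total = sum(cwnds)
    new_cwnd = total / (n + 1)
    per_conn_cut = new_cwnd / n
    updated = [w - per_conn_cut for w in cwnds]
    return updated, new_cwnd


# Two connections with windows of 6 and 3 segments admit a third.
updated, new_cwnd = admit_connection([6.0, 3.0])
```

Here the newcomer starts at 3 segments and the existing windows drop to 4.5
and 1.5, so the aggregate stays at 9. Because every connection gives up the
same amount, a connection with a very small window could be pushed below
zero; a real implementation would presumably need a floor, which this sketch
leaves out.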


Compatibility Issues 


Current TCP implementations do not use TCB caching, with the 
exception of T/TCP variants [4][7]. New connections use the default 
initial values of all non-instantiated TCB variables. As a result, 
each connection calculates its own RTT measurements, MSS value, and 
congestion information. Eventually these values are updated for each 
connection. 


For the congestion and current window information, the initial values 
may not be consistent with the long-term aggregate behavior of a set 
of concurrent connections. If a single connection has a window of 4 
segments, new connections assume initial windows of 1 segment (the 
minimum), although the current connection’s window doesn’t decrease 
to accommodate this additional load. As a result, connections can 
mutually interfere. One example of this has been seen on trans- 
Atlantic links, where concurrent connections supporting Web traffic 
can collide because their initial windows are too large, even when 
set at one segment. 


Because this proposal attempts to anticipate the aggregate steady- 
state values of TCB state among a group or over time, it should avoid 
the transient effects of new connections. In addition, because it 
considers the ensemble and temporal properties of those aggregates, 
it should also prevent the transients of short-lived or multiple 
concurrent connections from adversely affecting the overall network 
performance. We are performing analysis and experiments to validate 
these assumptions. 


Performance Considerations 


Here we attempt to optimize transient behavior of TCP without 
modifying its long-term properties. The predominant expense is in 
maintaining the cached values, or in using per-host state rather than 
per-connection state. In cases where performance is affected, 
however, we note that the per-host information can be kept in per- 
connection copies (as done now), because with higher performance 
should come less interference between concurrent connections. 


Sharing TCB state can occur only at connection establishment and 
close (to update the cache), to minimize overhead, optimize transient 
behavior, and minimize the effect on the steady-state. It is possible 
that sharing state during a connection, as in the RTT or window-size 
variables, may be of benefit, provided its implementation cost is not 
high. 


Implications 


There are several implications to incorporating TCB interdependence 
in TCP implementations. First, it may prevent the need for 
application-layer multiplexing for performance enhancement [6]. 
Protocols like persistent-HTTP avoid connection reestablishment costs 
by serializing or multiplexing a set of per-host connections across a 
single TCP connection. This avoids TCP’s per-connection OPEN 
handshake, and also avoids recomputing MSS, RTT, and congestion 
windows. By avoiding the so-called "slow-start restart," performance 
can be optimized. Our proposal provides the MSS, RTT, and OPEN 
handshake avoidance of T/TCP, and the "slow-start restart avoidance" 
of multiplexing, without requiring a multiplexing mechanism at the 
application layer. This multiplexing will be complicated when 
quality-of-service mechanisms (e.g., "integrated services 
scheduling") are provided later. 


Second, we are attempting to push some of the TCP implementation from 
the traditional transport layer (in the ISO model [10]), to the 
network layer. This acknowledges that some state currently maintained 
as per-connection is in fact per-path, which we simplify as per- 
host-pair. Transport protocols typically manage per-application-pair 
associations (per stream), and network protocols manage per-path 
associations (routing). Round-trip time, MSS, and congestion 
information is more appropriately handled in a network-layer fashion, 
aggregated among concurrent connections, and shared across connection 
instances. 


An earlier version of RTT sharing suggested implementing RTT state at 
the IP layer, rather than at the TCP layer [8]. Our observations are 
for sharing state among TCP connections, which avoids some of the 
difficulties in an IP-layer solution. One such problem is determining 
the associated prior outgoing packet for an incoming packet, to infer 
RTT from the exchange. Because RTTs are still determined inside the 
TCP layer, this is simpler than at the IP layer. This is a case where 
information should be computed at the transport layer, but shared at 
the network layer. 


We also note that per-host-pair associations are not the limit of 
these techniques. It is possible that TCBs could be similarly shared 
between hosts on a LAN, because the predominant path can be LAN-LAN, 
rather than host-host. 


There may be other information that can be shared between concurrent 
connections. For example, knowing that another connection has just 
tried to expand its window size and failed, a connection may not 
attempt to do the same for some period. The idea is that existing TCP 
implementations infer the behavior of all competing connections, 
including those within the same host or LAN. One possible 
optimization is to make that implicit feedback explicit, via extended 
information in the per-host TCP area. 


Security Considerations 


These suggested implementation enhancements do not have additional 
ramifications for direct attacks. These enhancements may be 
susceptible to denial-of-service attacks if not otherwise secured. 
For example, an application can open a connection and set its window 
size to 0, denying service to any other subsequent connection between 
those hosts. 


TCB sharing may be susceptible to denial-of-service attacks, wherever 
the TCB is shared, between connections in a single host, or between 
hosts if TCB sharing is implemented on the LAN (see Implications 
section). Some shared TCB parameters are used only to create new 
TCBs, others are shared among the TCBs of ongoing connections. New 
connections can join the ongoing set, e.g., to optimize send window 
size among a set of connections to the same host. 


Attacks on parameters used only for initialization affect only the 
transient performance of a TCP connection. For short connections, 
the performance ramification can approach that of a denial-of-service 
attack. E.g., if an application changes its TCB to have a false and 
small window size, subsequent connections would experience 
performance degradation until their window grew appropriately. 


The solution is to limit the effect of compromised TCB values. TCBs 
are compromised when they are modified directly by an application or 
transmitted between hosts via unauthenticated means (e.g., by using a 
dirty flag). TCBs that are not compromised by application 
modification do not have any unique security ramifications. Note that 
the proposed parameters for TCB sharing are not currently modifiable 
by an application. 


All shared TCBs MUST be validated against default minimum parameters 
before being used for new connections. This validation would not impact 
performance, because it occurs only at TCB initialization. This 
limits the effect of attacks on new connections, to reducing the 
benefit of TCB sharing, resulting in the current default TCP 
performance. For ongoing connections, the effect of incoming packets 
on shared information should be both limited and validated against 
constraints before use. This is a beneficial precaution for existing 
TCP implementations as well. 


TCBs modified by an application SHOULD NOT be shared, unless the new 
connection sharing the compromised information has been given 
explicit permission to use such information by the connection API. No 
mechanism for that indication currently exists, but it could be 
supported by an augmented API. This sharing restriction SHOULD be 
implemented in both the host and the LAN. Sharing on a LAN SHOULD 
utilize authentication to prevent undetected tampering of shared TCB 
parameters. These restrictions limit the security impact of modified 
TCBs both for connection initialization and for ongoing connections. 
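

The validation and sharing restrictions above might look like the following
at TCB initialization. Everything named here is hypothetical: the
"app_modified" flag stands in for whatever marks a TCB as changed by an
application, "allow_modified" stands in for the permission an augmented API
could grant, and the minimum values are placeholders.

```python
from dataclasses import dataclass

MIN_MSS = 536            # assumed default minimum, for illustration
MIN_CWND_SEGMENTS = 1    # assumed default initial window


@dataclass
class SharedTCB:
    mss: int
    cwnd_segments: int
    app_modified: bool = False   # hypothetical "dirty" marker


def values_for_new_connection(entry, allow_modified=False):
    """Return (mss, cwnd) for a new TCB, falling back to the defaults."""
    defaults = (MIN_MSS, MIN_CWND_SEGMENTS)
    if entry is None:
        return defaults
    # Application-modified TCBs are not shared without explicit permission.
    if entry.app_modified and not allow_modified:
        return defaults
    # Shared values are never allowed to fall below the default minimums.
    return (max(entry.mss, MIN_MSS),
            max(entry.cwnd_segments, MIN_CWND_SEGMENTS))


clamped = values_for_new_connection(SharedTCB(mss=1460, cwnd_segments=0))
refused = values_for_new_connection(SharedTCB(1460, 8, app_modified=True))
```

In both cases the worst outcome for the new connection is the default
behavior, which matches the stated goal of limiting an attack to removing
the benefit of sharing.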


Finally, shared values MUST be limited to performance factors only. 
Other information, such as TCP sequence numbers, is already known to 
compromise security when shared. 